Given a sample from a discretely observed compound Poisson process, we consider estimation 
of the density of the jump sizes. We propose a kernel type nonparametric density estimator and 
study its asymptotic properties. An order bound for the bias and an asymptotic expansion of 
the variance of the estimator are given. Pointwise weak consistency and asymptotic normality 
are established. The results show that, asymptotically, the estimator behaves very much like an 
ordinary kernel estimator. 
1. Introduction 

Let $N(\lambda)$ be a Poisson random variable with parameter $\lambda$ and let $Y_1, Y_2, \ldots$ be a sequence
of independent and identically distributed random variables that are independent of
$N(\lambda)$, have a common distribution function $F$ and have density $f$. Consider a Poisson
sum of $Y$s:

$$X = \sum_{j=1}^{N(\lambda)} Y_j.$$

Assume $\lambda$ is known. The statistical problem we consider is nonparametric estimation of
the density $f$ based on observations on $X$. Because adding a Poisson number of $Y$s is
referred to as compounding, we refer to the problem of recovering the density $f$ of the $Y$s
from the observations on $X$ as decompounding. The problem of estimating the density $f$
is equivalent to the problem of estimating the jump size density $f$ of a compound Poisson
process $X' = (X'_t)_{t \ge 0}$ with intensity $\lambda$ when the process is observed at equidistant time
points (rescaling if necessary, the observation step size can be taken to be equal to 1).
Compound Poisson processes have important applications in queueing and risk theory
(see, e.g., Embrechts et al. [7] and Prabhu [11]). For example, the random variables
$Y_1, Y_2, Y_3, \ldots$ can be interpreted as claims of random size that arrive at an insurance
company or as the number of customers who arrive at a service point at random times
with exponentially distributed interarrival times.

The problem of nonparametric estimation of the distribution function $F$ in the case
of both continuous and discrete laws was treated by Buchmann and Grübel [1]. Their
estimation method is based on a suitable inversion of the compounding operation (i.e.,
the transition from the distribution of $Y$ to the distribution of $X$) and the use of an empirical
estimator for the distribution of $X$, thus resulting in a plug-in type estimator for the
distribution of $Y$. A further ramification of this approach in the case of a discrete law
was given by Buchmann and Grübel [2]. To the best of our knowledge, the present paper
is the first attempt to (nonparametrically) estimate the density $f$. A very natural use of
nonparametric density estimators is the informal investigation of the properties of a given
set of data. The estimators can give valuable indications about the shape of the density
function, for example, such features as skewness and multimodality. Knowledge of
these features can be useful in applications, for example, in insurance, where $f$
is a claim size density.

One possible way to construct an estimator for the density $f$ (suggested in Hansen and
Pitts [9]) is to smooth with a kernel the plug-in type estimator $F_n$ of the distribution function $F$
defined by Buchmann and Grübel [1], but at present no theoretical
results for this estimator seem to be available. We opt for an alternative approach based
on inversion of the characteristic function $\phi_f$, an approach that is similar in spirit to
the use of kernel estimators in deconvolution problems (the latter were first introduced
by Liu and Taylor [10] and Stefanski and Carroll [15]; for a more recent overview, see
Wand and Jones [19]). Before we proceed any further, we need to specify the observation
scheme. Zero observations provide no information on the $Y$s and, hence, an estimator of
$f$ should be based on nonzero observations. In a sample of fixed size there is a random
number of nonzero observations. To avoid this extra technical complication,
we assume that we have observations $X_1, \ldots, X_{T_n}$ on $X$, where $T_n$ is the first moment
we get precisely $n$ nonzero observations ($T_n$ is, of course, random). We denote the nonzero
observations by $Z_1, Z_2, \ldots, Z_n$.
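The observation scheme can be illustrated by a short simulation. The Python sketch below (using NumPy; the function name `sample_nonzero_observations` and its `jump_sampler` argument are our own illustrative choices, not part of the paper) draws $X_1, X_2, \ldots$ until exactly $n$ nonzero values have been seen and returns them together with the random index $T_n$. It assumes a continuous jump law, so that $X \neq 0$ almost surely whenever $N > 0$.

```python
import numpy as np

def sample_nonzero_observations(n, lam, jump_sampler, rng):
    """Draw X_1, X_2, ... until n nonzero observations are collected.

    Returns (Z, T_n): the nonzero observations Z_1..Z_n and the random
    number T_n of observations of X that were needed.
    """
    z = []
    t_n = 0
    while len(z) < n:
        t_n += 1
        k = rng.poisson(lam)                    # N ~ Poisson(lam)
        if k == 0:                              # X = 0 carries no information on the Y's
            continue
        z.append(jump_sampler(rng, k).sum())    # X = Y_1 + ... + Y_N
    return np.array(z), t_n

# Illustration: standard normal jumps and lam = 0.3, as in Section 3.
rng = np.random.default_rng(0)
z, t_n = sample_nonzero_observations(1000, 0.3, lambda r, k: r.standard_normal(k), rng)
print(len(z), t_n, len(z) / t_n)   # the last ratio estimates P(N > 0) = 1 - exp(-0.3)
```

Since $P(X \neq 0) = P(N > 0) = 1 - e^{-\lambda}$, the expected number of observations needed is about $n/(1 - e^{-\lambda})$.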

We turn to the construction of the estimator of the density $f$. First note that the
characteristic function of $X$ is given by

$$\mathrm{E}\big[e^{itX}\big] = e^{-\lambda + \lambda\phi_f(t)},$$

where $\phi_f$ denotes the characteristic function of a random variable with density $f$. Rewrite
the characteristic function of $X$ as

$$\phi_X(t) = e^{-\lambda} + (1 - e^{-\lambda})\,\frac{1}{e^{\lambda} - 1}\big(e^{\lambda\phi_f(t)} - 1\big).$$

Denote the density of $X$ given $N > 0$ by $g$. It follows that the characteristic function of
$X$ given $N > 0$ is equal to

$$\phi_g(t) = \frac{1}{e^{\lambda} - 1}\big(e^{\lambda\phi_f(t)} - 1\big).$$

Because $\phi_f$ vanishes at plus and minus infinity, so does $\phi_g$. By inverting the above
relationship, we get

$$\phi_f(t) = \frac{1}{\lambda}\,\mathrm{Log}\big((e^{\lambda} - 1)\phi_g(t) + 1\big).$$

Here Log denotes the distinguished logarithm (in general, we cannot use the principal
branch of the logarithm) and we refer to Chung ([4], Theorem 7.6.2) and Finkelstein et
al. [8] for details of its construction. Notice that whenever $\lambda < \log 2$, the distinguished
logarithm reduces to the principal branch of the ordinary logarithm, because then $|(e^{\lambda} - 1)\phi_g(t)| \le e^{\lambda} - 1 < 1$, so the path $(e^{\lambda} - 1)\phi_g(t) + 1$ stays in the open disc of radius 1 around 1 and never winds around zero. By Fourier inversion,
if $\phi_f$ is integrable, we have

$$f(x) = \frac{1}{2\pi\lambda}\int_{-\infty}^{\infty} e^{-itx}\,\mathrm{Log}\big((e^{\lambda} - 1)\phi_g(t) + 1\big)\,dt. \tag{1.1}$$
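To see why the distinguished logarithm is needed, one can check the inversion formula numerically. In the sketch below (an illustration of our own; the choice of $N(1, 1)$ jumps and of $\lambda = 8$ is arbitrary), $\phi_f$ is complex-valued and $\lambda\,\mathrm{Im}\,\phi_f(t)$ exceeds $\pi$ for some $t$. The principal branch therefore returns a wrong value there, while a logarithm whose argument is followed continuously along a fine grid in $t$ (approximated here with `numpy.unwrap`) recovers $\phi_f$.

```python
import numpy as np

lam = 8.0                                   # large intensity: lam * Im(phi_f) exceeds pi somewhere
t = np.linspace(-10.0, 10.0, 20001)         # fine grid, contains t = 0
phi_f = np.exp(1j * t - t**2 / 2)           # ch.f. of N(1, 1) jumps
phi_g = (np.exp(lam * phi_f) - 1) / np.expm1(lam)    # ch.f. of X given N > 0

psi = np.expm1(lam) * phi_g + 1             # the path (e^lam - 1) phi_g(t) + 1

# Principal branch: the argument is wrapped into (-pi, pi].
principal = np.log(psi) / lam

# Distinguished logarithm on the grid: log|psi| plus a continuously followed argument.
# np.unwrap removes the 2*pi jumps; at t = -10 the true argument is essentially 0.
distinguished = (np.log(np.abs(psi)) + 1j * np.unwrap(np.angle(psi))) / lam

print(np.max(np.abs(principal - phi_f)))       # about 2*pi/lam: the principal branch fails
print(np.max(np.abs(distinguished - phi_f)))   # tiny: phi_f is recovered
```

For $\lambda < \log 2$ both versions agree, in line with the remark above.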

This relation suggests that if we construct an estimator of $g$ (and hence of $\phi_g$), we will
automatically get an estimator for $f$ by a plug-in device. Let $w$ denote a kernel function
with characteristic function $\phi_w$ and let $h$ denote a positive number, the bandwidth. The
density $g$ will be estimated by the kernel density estimator

$$g_{nh}(x) = \frac{1}{nh}\sum_{j=1}^{n} w\Big(\frac{x - Z_j}{h}\Big).$$

Properties of kernel estimators can be found in recent books such as Devroye and Györfi
[6], Prakasa Rao [12], Tsybakov [16] and Wand and Jones [19]. The characteristic function
$\phi_{g_{nh}}$ serves as an estimator of $\phi_g$ and is equal to $\phi_{\mathrm{emp}}(t)\phi_w(ht)$, where $\phi_{\mathrm{emp}}$ denotes the
empirical characteristic function

$$\phi_{\mathrm{emp}}(t) = \frac{1}{n}\sum_{j=1}^{n} e^{itZ_j}.$$
In view of (1.1) it is tempting to introduce the estimator

$$\frac{1}{2\pi\lambda}\int_{-\infty}^{\infty} e^{-itx}\,\mathrm{Log}\big((e^{\lambda} - 1)\phi_{\mathrm{emp}}(t)\phi_w(ht) + 1\big)\,dt, \tag{1.2}$$

but there are two problems. First, the probability of the set of those $\omega$ from the underlying sample
space $\Omega$ for which the path $(e^{\lambda} - 1)\phi_{g_{nh}}(t) + 1$ can become zero is positive (although as
$n \to \infty$, this probability tends to zero), and the distinguished logarithm cannot be defined
for such $\omega$. Second, there is no guarantee that the integral in (1.2) is finite. Therefore,
we will make the adjustments

$$f_{nh}(x) = \big(M_n \wedge \tilde f_{nh}(x)\big) \vee (-M_n), \tag{1.3}$$

where for those $\omega$ for which the paths $(e^{\lambda} - 1)\phi_{\mathrm{emp}}(t)\phi_w(ht) + 1$ do not vanish, $\tilde f_{nh}$ is
given by

$$\tilde f_{nh}(x) = \frac{1}{2\pi\lambda}\int_{-1/h}^{1/h} e^{-itx}\,\mathrm{Log}\big((e^{\lambda} - 1)\phi_{\mathrm{emp}}(t)\phi_w(ht) + 1\big)\,dt,$$

and $\tilde f_{nh}(x) = 0$ otherwise. Here $M = (M_n)_{n \ge 1}$ is a sequence of positive real numbers that
converges to infinity at a suitable rate. We also assume that $\phi_w$ is supported on $[-1, 1]$, which is why the integration in the definition of $\tilde f_{nh}$ runs over $[-1/h, 1/h]$.
Of course, for the truncation in (1.3) to make sense, $\tilde f_{nh}(x)$ must be real-valued, but this
is easy to check through the change of the integration variable from $t$ into $-t$.
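A direct (non-FFT) evaluation of $f_{nh}$ in (1.3) may look as follows. This is a minimal sketch under assumptions of our own: $\lambda < \log 2$, so that the distinguished logarithm is the principal branch and the path $(e^{\lambda} - 1)\phi_{\mathrm{emp}}(t)\phi_w(ht) + 1$ cannot vanish (both $|\phi_{\mathrm{emp}}|$ and $|\phi_w|$ are at most 1); $\phi_w(s) = (1 - s^2)^3$ on $[-1, 1]$, the kernel used in Section 3; the integral over $[-1/h, 1/h]$ is approximated by a Riemann sum on an equispaced grid; and $M_n = n^{\alpha}$ with the hypothetical default $\alpha = 1$. The function name `decompound_density` is ours.

```python
import numpy as np

def wand_phi(s):
    """phi_w(s) = (1 - s^2)^3 on [-1, 1] and 0 outside."""
    return np.where(np.abs(s) <= 1, (1 - s**2) ** 3, 0.0)

def decompound_density(x, z, lam, h, alpha=1.0, n_t=801):
    """Sketch of f_nh(x) from (1.3), assuming lam < log 2 (principal branch of Log)."""
    if not lam < np.log(2):
        raise ValueError("this sketch assumes lam < log 2")
    x = np.atleast_1d(np.asarray(x, dtype=float))
    t = np.linspace(-1 / h, 1 / h, n_t)              # phi_w(h t) = 0 outside [-1/h, 1/h]
    dt = t[1] - t[0]
    phi_emp = np.exp(1j * np.outer(t, z)).mean(axis=1)
    log_term = np.log(np.expm1(lam) * phi_emp * wand_phi(h * t) + 1)
    # Riemann sum; the integrand vanishes at t = +-1/h, so this coincides with the trapezoid
    # rule. By the symmetry t -> -t the imaginary part is zero up to rounding.
    tilde = (np.exp(-1j * np.outer(x, t)) @ log_term).real * dt / (2 * np.pi * lam)
    m_n = len(z) ** alpha                           # truncation level M_n = n^alpha
    return np.clip(tilde, -m_n, m_n)                # (M_n ^ tilde) v (-M_n)

# Simulated data: lam = 0.3 and standard normal jumps (a sum of k of them is N(0, k)).
rng = np.random.default_rng(2)
lam, n = 0.3, 2000
counts = rng.poisson(lam, size=20 * n)
counts = counts[counts > 0][:n]                     # N conditionally on N > 0
z = np.sqrt(counts) * rng.standard_normal(n)
est = decompound_density([0.0, 1.0], z, lam, h=0.14)
print(est, np.exp(-np.array([0.0, 1.0]) ** 2 / 2) / np.sqrt(2 * np.pi))
```

The two printed rows (estimate and true standard normal density) should be close; the remaining gap reflects the bias and variance studied in Section 2.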

The rest of the paper is organized as follows. Section 2 contains the main results of the
paper. In it we derive an order bound for the bias and an asymptotic expansion of the
variance of $f_{nh}$ at a fixed point $x$, and we show that the estimator is weakly consistent
and asymptotically normal. Section 3 provides some simulation results. All the proofs
are collected in Section 4.



2. Asymptotic properties of the estimator 

As is usual in nonparametric estimation, the nonparametric setting forces us to make
some smoothness assumptions on the density $f$. Let $\beta$, $L_1$ and $L_2$ denote some positive
numbers and let $l = \lfloor \beta \rfloor$ denote the integer part of $\beta$. If $l = 0$, then by definition set
$f^{(l)} = f$. Recall the definitions of the Hölder and Nikol'ski classes of functions (cf. Tsybakov
[16], pages 5, 19).

Definition 2.1. A function $f$ is said to belong to the Hölder class $\mathcal{H}(\beta, L_1)$ if its derivatives
up to order $l$ exist and verify the condition

$$|f^{(l)}(x + t) - f^{(l)}(x)| \le L_1 |t|^{\beta - l} \quad \text{for all } x, t \in \mathbb{R}.$$

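Definition 2.2. A function $f$ is said to belong to the Nikol'ski class $\mathcal{N}(\beta, L_2)$ if its
derivatives up to order $l$ exist and verify the condition

$$\Big(\int_{-\infty}^{\infty} \big(f^{(l)}(x + t) - f^{(l)}(x)\big)^2\,dx\Big)^{1/2} \le L_2 |t|^{\beta - l} \quad \text{for all } t \in \mathbb{R}.$$

We formulate the condition on the density $f$.

Condition F. The density $f$ belongs to $\mathcal{H}(\beta, L_1) \cap \mathcal{N}(\beta, L_2)$. Moreover, $|t|^{\beta}\phi_f(t)$ is integrable
and the derivatives $f', \ldots, f^{(l)}$ are integrable.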

The following lemma holds true. It is proved in Section 4. 

Lemma 2.1. Assume that Condition F holds. Then the density $g$ belongs to $\mathcal{H}(\beta, L_1) \cap
\mathcal{N}(\beta, \lambda e^{\lambda}(e^{\lambda} - 1)^{-1} L_2)$. Moreover, $|t|^{\beta}\phi_g(t)$ is integrable.

We will use this fact in the proofs of Propositions 2.1 and 2.2 and Theorems 2.1 and
2.2. The requirement that $g \in \mathcal{N}(\beta, \lambda e^{\lambda}(e^{\lambda} - 1)^{-1} L_2)$ is motivated by the fact that in
the proofs we will make use of the expansion of the mean integrated squared error of a
kernel density estimator $g_{nh}$ (cf. Tsybakov [16], page 21), while $g \in \mathcal{H}(\beta, L_1)$ is a standard
condition in ordinary kernel density estimation (see Tsybakov [16], Proposition 1.2). The
integrability of $f', \ldots, f^{(l)}$ is used in the proof of Lemma 2.1.

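Definition 2.3. A function $w$ is called a kernel of order $l$ if the functions $u \mapsto u^j w(u)$, $j =
0, \ldots, l$, are integrable and verify the conditions

$$\int_{-\infty}^{\infty} w(u)\,du = 1, \qquad \int_{-\infty}^{\infty} u^j w(u)\,du = 0, \quad j = 1, \ldots, l.$$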
Because it is generally recognized that the choice of a kernel is less important for the 
performance of an estimator (see Wand and Jones [19], page 31), we feel free to impose 
the following condition on the kernel. 

Condition W. The kernel function $w$ satisfies the following conditions:

1. $w$ is a bounded symmetric kernel of order $l$.

2. The characteristic function $\phi_w$ is supported on $[-1, 1]$.

3. $\int_{-\infty}^{\infty} |u|^{\beta}|w(u)|\,du < \infty$.

4. $\lim_{|u| \to \infty} |u\,w(u)| = 0$.

5. $\phi_w$ is continuously differentiable.

To get a consistent estimator, we need to control the bandwidth, so we impose the
following restriction.

Condition H. The bandwidth $h$ depends on $n$ and is of the form $h = C n^{-\gamma}$ for $0 < \gamma < 1$,
where $C$ is some constant.

We also formulate the condition on the truncating sequence $M = (M_n)_{n \ge 1}$ (see Section 1).

Condition M. The truncating sequence $M = (M_n)_{n \ge 1}$ is given by $M_n = n^{\alpha}$, where $\alpha$
is some strictly positive number.

As the performance criterion, we select the mean squared error

$$\mathrm{MSE}[f_{nh}(x)] = \mathrm{E}\big[(f_{nh}(x) - f(x))^2\big].$$

By standard properties of mean and variance,

$$\mathrm{MSE}[f_{nh}(x)] = \big(\mathrm{E}[f_{nh}(x)] - f(x)\big)^2 + \mathrm{Var}[f_{nh}(x)],$$

the sum of the squared bias and the variance at $x$.

First we study the behaviour of the bias of the estimator $f_{nh}(x)$.

Proposition 2.1. Suppose Conditions F, W, H and M are satisfied. Then the bias of
the estimator $f_{nh}(x)$ admits the order bound

$$\mathrm{E}[f_{nh}(x)] - f(x) = O\Big(h^{\beta} + \frac{1}{nh}\Big).$$

In ordinary kernel estimation, under the assumption $g \in \mathcal{H}(\beta, L_1)$, the bias is of order
$h^{\beta}$ (see Tsybakov [16], Proposition 1.2). We have an additional term of order $(nh)^{-1}$
that comes from the difficulty of the decompounding problem. Under the standard conditions
$h \to 0$ and $nh \to \infty$, the bias will asymptotically vanish.

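Remark 2.1. If $\beta = 2$, then, as in our technical report [17], it is possible to derive an
exact asymptotic expansion for the bias. The leading term in the bias expansion will be

$$-\frac{h^2}{4\pi\lambda}\int_{-\infty}^{\infty} u^2 w(u)\,du \int_{-\infty}^{\infty} e^{-itx}\,\frac{(e^{\lambda} - 1)\,t^2\phi_g(t)}{(e^{\lambda} - 1)\phi_g(t) + 1}\,dt.$$

Now let us study the variance of the estimator $f_{nh}(x)$.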



Proposition 2.2. Suppose that, apart from Conditions F, W, H and M, the additional
condition $nh^{1+4\beta} \to 0$ holds true. Then the variance of the estimator $f_{nh}(x)$ admits the
decomposition

$$\mathrm{Var}[f_{nh}(x)] = \frac{1}{nh}\,\frac{(e^{\lambda} - 1)^2}{\lambda^2}\,g(x)\int_{-\infty}^{\infty} (w(u))^2\,du + o\Big(\frac{1}{nh}\Big). \tag{2.1}$$

We see that the variance of our estimator is of the same order as the variance of an ordinary
kernel estimator (cf. Tsybakov [16], Proposition 1.4). Under the standard assumption
$nh \to \infty$, it will vanish. From a practical point of view, the condition $nh^{1+4\beta} \to 0$
is not restrictive, especially in view of Proposition 2.3 given below.
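For a concrete kernel the constant in (2.1) is explicit. For the kernel used in Section 3, with $\phi_w(t) = (1 - t^2)^3$ on $[-1, 1]$, Parseval's identity gives $\int (w(u))^2\,du = \frac{1}{2\pi}\int_{-1}^{1}(1 - t^2)^6\,dt = \frac{1024}{3003\pi}$. The sketch below computes this constant with NumPy's polynomial class and evaluates the leading term of (2.1). The function name `leading_variance` is ours, and the value $g(x)$ has to be supplied by the user; in practice it is unknown, so any plug-in value is an assumption.

```python
import math
from numpy.polynomial import Polynomial

# int w(u)^2 du = (1/(2 pi)) int_{-1}^{1} phi_w(t)^2 dt, with phi_w(t) = (1 - t^2)^3.
phi_w_squared = Polynomial([1.0, 0.0, -1.0]) ** 6
antiderivative = phi_w_squared.integ()
INT_W2 = (antiderivative(1.0) - antiderivative(-1.0)) / (2 * math.pi)

def leading_variance(n, h, lam, g_x, int_w2=INT_W2):
    """Leading term of (2.1): (1/(n h)) * ((e^lam - 1)/lam)^2 * g(x) * int w^2."""
    return (math.expm1(lam) / lam) ** 2 * g_x * int_w2 / (n * h)

print(INT_W2, 1024 / (3003 * math.pi))      # both approximately 0.1085
print(leading_variance(1000, 0.14, 0.3, 0.38))
```

Compared with the leading variance term $g(x)\int w^2/(nh)$ of an ordinary kernel estimator of $g$, the leading term in (2.1) carries the extra factor $((e^{\lambda} - 1)/\lambda)^2$, which exceeds 1 for every $\lambda > 0$ and tends to 1 as $\lambda \to 0$.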

By combining Propositions 2.1 and 2.2, we get the following corollary.

Corollary 2.1. Suppose Conditions F, W, H and M hold. The estimator $f_{nh}(x)$ is pointwise
weakly consistent under the additional assumption $nh^{1+4\beta} \to 0$.

Recall that the bandwidth $h_{\mathrm{opt}}$ that asymptotically minimizes the mean squared error
of a kernel estimator is called optimal. From Propositions 2.1 and 2.2 it is now possible
to determine the order of the optimal bandwidth for the estimator $f_{nh}$.

Proposition 2.3. The optimal bandwidth $h_{\mathrm{opt}}$ is of order $n^{-1/(2\beta+1)}$. Furthermore, the
mean squared error of the estimator $f_{nh}$ computed for the optimal bandwidth is of order
$n^{-2\beta/(2\beta+1)}$.

Note that the optimal bandwidth is of order $n^{-1/(2\beta+1)}$, just as in the case of ordinary
kernel estimation.
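A heuristic way to see where the rates in Proposition 2.3 come from is to treat the order bounds of Propositions 2.1 and 2.2 as if they were exact rates, with constants $C_1, C_2 > 0$ that we introduce only for this computation. The cross term $2h^{\beta}(nh)^{-1}$ and the term $(nh)^{-2}$ in the squared bias are of lower order than $(nh)^{-1}$ when $h \to 0$ and $nh \to \infty$, so they are dropped:

```latex
\begin{aligned}
\mathrm{MSE}(h) &\approx C_1 h^{2\beta} + \frac{C_2}{nh},\\
\frac{d}{dh}\,\mathrm{MSE}(h) &= 2\beta C_1 h^{2\beta-1} - \frac{C_2}{nh^2} = 0
\quad\Longrightarrow\quad
h_{\mathrm{opt}} = \Big(\frac{C_2}{2\beta C_1}\Big)^{1/(2\beta+1)} n^{-1/(2\beta+1)},\\
\mathrm{MSE}(h_{\mathrm{opt}}) &\asymp h_{\mathrm{opt}}^{2\beta} + \frac{1}{n h_{\mathrm{opt}}} \asymp n^{-2\beta/(2\beta+1)}.
\end{aligned}
```

For $\beta = 2$ this gives $h_{\mathrm{opt}} \asymp n^{-1/5}$ and $\mathrm{MSE}(h_{\mathrm{opt}}) \asymp n^{-4/5}$.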
Remark 2.2. When $\beta = 2$, then, as in van Es et al. ([17], Proposition 3.3), it is possible
to derive an exact expression for $h_{\mathrm{opt}}$.

The extension of our results to the case of a data-dependent bandwidth is outside the scope
of the present paper.

It is interesting to verify whether our estimator is minimax. We refer to van Es et
al. ([17], Theorem 3.1), where for $\beta = 2$ we proved that the minimax convergence rate
for a quadratic loss function is at least $n^{-2/5}$ and that our estimator attains it for a
fixed density $f$. This result can be easily generalized to an arbitrary $\beta > 0$. Whether the
estimator itself is minimax is an open question. In any case, the results of the present
section show that its behaviour is rather reasonable.

Concluding this section, we will derive two asymptotic normality results for $f_{nh}$.

Theorem 2.1. Assume that Conditions F, W, H and M hold, that the bandwidth
$h$ satisfies the additional condition $nh^{2\beta+1} \to 0$ and that $g(x) \ne 0$. Then

$$\frac{f_{nh}(x) - f(x)}{\sqrt{\mathrm{Var}[f_{nh}(x)]}} \xrightarrow{\mathcal{D}} N(0, 1),$$

where $N(0, 1)$ is the standard normal distribution.



Asymptotic normality still holds if $nh^{2\beta+1} \to C$, where $C$ is some constant, but in this
case the limit will not be distribution-free; it will depend on the unknown function $g$. We
cannot select an optimal bandwidth to obtain (distribution-free) asymptotic normality,
but this is also the case in ordinary kernel estimation. This fact comes from the trade-off
between bias and variance; for the details, see the proof of the theorem. Now let us
consider a different centering: $f_{nh}(x) - \mathrm{E}[f_{nh}(x)]$. Then the following theorem holds true.

Theorem 2.2. Suppose that Conditions F, W, H and M hold, $g(x) \ne 0$ and $nh^{1+4\beta} \to 0$.
Then we have

$$\frac{f_{nh}(x) - \mathrm{E}[f_{nh}(x)]}{\sqrt{\mathrm{Var}[f_{nh}(x)]}} \xrightarrow{\mathcal{D}} N(0, 1).$$

We see that, in this case, the additional condition on the bandwidth is weaker than
the one in Theorem 2.1.
3. Simulation results and numerical aspects

In this section we present two simulations. They complement the asymptotic results
of Theorems 2.1 and 2.2 and give some (although incomplete) indication of the finite
sample properties of the estimator.

In the first example, the true density $f$ is the standard normal density and $\lambda = 0.3$.
The kernel we used is from Wand [18] and it has the rather complicated expression

$$w(t) = \frac{48t(t^2 - 15)\cos t - 144(2t^2 - 5)\sin t}{\pi t^7},$$

but its characteristic function looks much simpler and is given by

$$\phi_w(t) = (1 - t^2)^3\,1_{[|t| \le 1]}.$$
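The closed form of the kernel can be checked against direct numerical inversion of $\phi_w$, since $w(x) = \frac{1}{2\pi}\int_{-1}^{1} e^{-itx}(1 - t^2)^3\,dt = \frac{1}{\pi}\int_0^1 \cos(tx)(1 - t^2)^3\,dt$. In the sketch below (function names are ours) the closed form is evaluated only away from $x = 0$, where it suffers from cancellation; at $x = 0$ the inversion gives $w(0) = 16/(35\pi)$.

```python
import numpy as np

def wand_kernel(x):
    """Closed form of the kernel (valid for x != 0; loses accuracy as x -> 0)."""
    x = np.asarray(x, dtype=float)
    return (48 * x * (x**2 - 15) * np.cos(x) - 144 * (2 * x**2 - 5) * np.sin(x)) / (np.pi * x**7)

def wand_kernel_by_inversion(x, n_t=4001):
    """w(x) = (1/pi) * int_0^1 cos(t x) (1 - t^2)^3 dt, computed with the trapezoidal rule."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    t = np.linspace(0.0, 1.0, n_t)
    vals = np.cos(np.outer(x, t)) * (1 - t**2) ** 3
    dt = t[1] - t[0]
    integral = (vals.sum(axis=1) - 0.5 * (vals[:, 0] + vals[:, -1])) * dt
    return integral / np.pi

xs = np.array([1.0, 2.5, 5.0, 10.0])
print(np.max(np.abs(wand_kernel(xs) - wand_kernel_by_inversion(xs))))
print(wand_kernel_by_inversion(0.0)[0], 16 / (35 * np.pi))
```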

The estimator is based on 1000 observations and the bandwidth equals 0.14 (the bandwidth
was selected by hand). To compute the estimator, we used the fast Fourier transform.
The idea, which is close in spirit to the method for numerical evaluation of option
prices proposed by Carr and Madan [3], is sketched as follows:

(i) Notice that whenever $\lambda < \log 2$, the distinguished logarithm in (1.2) reduces to the
principal branch of the logarithm.

(ii) The main use of the truncation in (1.3) is to prove asymptotic properties of the estimator
and, in general, we do not need to use it in practice.

(iii) The computation of the empirical characteristic function can be significantly sped
up by grouping the observations, an idea used to numerically evaluate ordinary kernel
density estimators. However, we computed the empirical characteristic function directly,
without grouping the observations. Notice that we do not use the values of the empirical
characteristic function in its tails.

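(iv) Notice that we can rewrite (1.2) as $f_{nh}(x) = f^{(1)}_{nh}(x) + f^{(2)}_{nh}(x)$, where

$$f^{(1)}_{nh}(x) = \frac{1}{2\pi\lambda}\int_0^{\infty} e^{-itx}\,\mathrm{Log}\big((e^{\lambda} - 1)\phi_{\mathrm{emp}}(t)\phi_w(ht) + 1\big)\,dt,$$

$$f^{(2)}_{nh}(x) = \frac{1}{2\pi\lambda}\int_0^{\infty} e^{itx}\,\mathrm{Log}\big((e^{\lambda} - 1)\phi_{\mathrm{emp}}(-t)\phi_w(ht) + 1\big)\,dt.$$

Using the trapezoid rule and setting $v_j = \eta(j - 1)$, $f^{(1)}_{nh}(x)$ can be approximated by

$$f^{(1)}_{nh}(x) \approx \frac{1}{2\pi\lambda}\sum_{j=1}^{N} e^{-iv_j x}\,\psi(v_j)\,\eta.$$

Here we take $N$ to be some power of 2 and $\psi(v_j) = \mathrm{Log}\big((e^{\lambda} - 1)\phi_{g_{nh}}(v_j) + 1\big)$. The
application of the fast Fourier transform to this sum will give us $N$ values of $f^{(1)}_{nh}$, and
we employ a regular spacing size $\delta$, so that our values of $x$ are

$$x_u = -\frac{N\delta}{2} + \delta(u - 1),$$

where $u = 1, \ldots, N$. Thus we have

$$f^{(1)}_{nh}(x_u) \approx \frac{1}{2\pi\lambda}\sum_{j=1}^{N} e^{-i\delta\eta(j-1)(u-1)}\,e^{i\frac{N\delta}{2}v_j}\,\psi(v_j)\,\eta$$

for $u = 1, \ldots, N$. To apply the fast Fourier transform, we note that we must take $\delta\eta =
2\pi/N$. If we choose $\eta$ small to obtain a fine grid for integration, then we will obtain
values of $f^{(1)}_{nh}$ at values of $x_u$ that are relatively far apart from each other. We would
like, therefore, to obtain an accurate integration for larger values of $\eta$; to this end we
incorporate Simpson weightings into our summation, that is,

$$f^{(1)}_{nh}(x_u) \approx \frac{1}{2\pi\lambda}\sum_{j=1}^{N} e^{-i\frac{2\pi}{N}(j-1)(u-1)}\,e^{i\frac{N\delta}{2}v_j}\,\psi(v_j)\,\frac{\eta}{3}\big(3 + (-1)^j - \delta_{j-1}\big),$$

where $\delta_j$ is the Kronecker delta function. Similar reasoning applies to $f^{(2)}_{nh}(x)$.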

The result of this procedure for $N = 16\,384$ and $\eta = 0.01$ is given in Figure 1 (the
estimate is represented by the bold dotted line).
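Steps (i)-(iv) can be sketched in Python with NumPy as follows. This is a sketch under our own assumptions, not the authors' code: $\lambda < \log 2$, so the principal branch is used as in (i); no truncation, as in (ii); the empirical characteristic function is computed directly and only where $\phi_w(hv_j) \neq 0$, as in (iii); and $f^{(2)}_{nh}$ is obtained from the analogous sum using $\psi(-v_j) = \overline{\psi(v_j)}$. The helper `direct_estimate`, a plain trapezoidal evaluation of $(\pi\lambda)^{-1}\,\mathrm{Re}\int_0^{1/h} e^{-itx}\psi(t)\,dt$, is included only to cross-check the FFT output at one grid point.

```python
import numpy as np

def wand_phi(s):
    """phi_w(s) = (1 - s^2)^3 on [-1, 1] and 0 outside."""
    return np.where(np.abs(s) <= 1, (1 - s**2) ** 3, 0.0)

def log_term(v, z, lam, h):
    """psi(v) = Log((e^lam - 1) phi_emp(v) phi_w(h v) + 1), principal branch (lam < log 2)."""
    phi_emp = np.exp(1j * np.outer(v, z)).mean(axis=1)
    return np.log(np.expm1(lam) * phi_emp * wand_phi(h * v) + 1)

def fft_estimate(z, lam, h, N=2**14, eta=0.01):
    """Values of f_nh on x_u = -N delta/2 + delta (u - 1), where delta * eta = 2 pi / N."""
    assert lam < np.log(2)
    delta = 2 * np.pi / (N * eta)
    v = eta * np.arange(N)                             # v_j = eta (j - 1)
    x = -N * delta / 2 + delta * np.arange(N)          # x_u
    psi = np.zeros(N, dtype=complex)                   # psi = Log(1) = 0 where phi_w(h v) = 0
    inside = h * v < 1                                 # tails of phi_emp are never needed
    psi[inside] = log_term(v[inside], z, lam, h)
    j = np.arange(1, N + 1)
    simpson = eta / 3 * (3 + (-1.0) ** j - (j == 1).astype(float))   # Kronecker delta_{j-1}
    shift = np.exp(1j * (N * delta / 2) * v)
    f1 = np.fft.fft(shift * psi * simpson) / (2 * np.pi * lam)        # integral over t >= 0
    # f^(2): e^{+i v x} and psi(-v) = conj(psi(v)); numpy's ifft includes a factor 1/N.
    f2 = N * np.fft.ifft(np.conj(shift) * np.conj(psi) * simpson) / (2 * np.pi * lam)
    f = f1 + f2
    return x, f.real, np.max(np.abs(f.imag))

def direct_estimate(x0, z, lam, h, n_t=2001):
    """Trapezoidal cross-check: (1/(pi lam)) Re int_0^{1/h} e^{-i t x0} psi(t) dt."""
    t = np.linspace(0.0, 1 / h, n_t)
    vals = np.exp(-1j * t * x0) * log_term(t, z, lam, h)
    dt = t[1] - t[0]
    return (vals.sum() - 0.5 * (vals[0] + vals[-1])).real * dt / (np.pi * lam)

# Simulated data as in the first example: lam = 0.3, standard normal jumps, n = 1000, h = 0.14.
rng = np.random.default_rng(5)
lam, n, h = 0.3, 1000, 0.14
counts = rng.poisson(lam, size=20 * n)
counts = counts[counts > 0][:n]                        # N conditionally on N > 0
z = np.sqrt(counts) * rng.standard_normal(n)           # sum of N standard normal jumps

x, f_hat, max_imag = fft_estimate(z, lam, h)
u0 = int(np.argmin(np.abs(x)))                         # the grid point x_u = 0
print(x[1] - x[0], f_hat[u0], direct_estimate(x[u0], z, lam, h), max_imag)
```

With $N = 2^{14}$ and $\eta = 0.01$ the output grid spacing is $\delta = 2\pi/(N\eta) \approx 0.038$, so a single FFT returns the estimate on a fine grid of $x$ values.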

In the second example we consider the case when $f$ is a mixture of two normal densities
with means 0 and 3/2 and variances 1 and 1/9 with mixing probabilities 3/4 and 1/4,
respectively. The estimator is based on 1000 observations and the bandwidth equals
0.1; the kernel is the same as in the first example. The result is given in Figure 2 (the
estimate is plotted by the bold dotted line). Note that the estimator captures the bimodal
character of the density $f$ in a quite satisfactory manner.
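For reference, the jump density of the second example, $f = \frac34 N(0, 1) + \frac14 N(3/2, 1/9)$, and a sampler for it can be written as below (names are ours). Such a sampler can be plugged in as the jump distribution when simulating the compound sums. Evaluating the density at a few points shows the dip between the two modes that the estimator is expected to capture.

```python
import numpy as np

WEIGHTS = np.array([0.75, 0.25])
MEANS = np.array([0.0, 1.5])
SDS = np.array([1.0, 1.0 / 3.0])           # variances 1 and 1/9

def mixture_density(x):
    """f(x) = 3/4 N(0, 1) + 1/4 N(3/2, 1/9), vectorised over x."""
    x = np.asarray(x, dtype=float)[..., None]
    comps = np.exp(-0.5 * ((x - MEANS) / SDS) ** 2) / (SDS * np.sqrt(2 * np.pi))
    return comps @ WEIGHTS

def sample_mixture(rng, k):
    """Draw k jump sizes: pick a component for each draw, then a normal variate."""
    comp = rng.choice(2, size=k, p=WEIGHTS)
    return MEANS[comp] + SDS[comp] * rng.standard_normal(k)

grid = np.linspace(-10.0, 10.0, 20001)
print(mixture_density(grid).sum() * (grid[1] - grid[0]))   # total mass, close to 1
print(mixture_density([0.0, 0.9, 1.5]))                     # two modes with a dip in between
```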
4. Proofs 



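Proof of Lemma 2.1. We have $|\phi_g(t)| \le C|\phi_f(t)|$, which follows from the relationship

$$\phi_g(t) = \frac{1}{e^{\lambda} - 1}\big(e^{\lambda\phi_f(t)} - 1\big).$$

Indeed,

$$\big|e^{\lambda\phi_f(t)} - 1\big| = \Big|-1 + 1 + \lambda\phi_f(t) + \frac{(\lambda\phi_f(t))^2}{2!} + \cdots\Big| \le \lambda|\phi_f(t)|\,e^{\lambda|\phi_f(t)|} \le \lambda e^{\lambda}|\phi_f(t)|,$$

and $|\phi_g(t)| \le C|\phi_f(t)|$ follows, where $C = \lambda e^{\lambda}(e^{\lambda} - 1)^{-1}$. This implies that $|t|^{\beta}\phi_g(t)$ is
integrable. Furthermore,

$$g(x) = \sum_{n=1}^{\infty} f^{*n}(x)\,P(N = n \mid N > 0),$$

where $f^{*n}$ denotes the $n$-fold convolution of $f$. By Parseval's theorem,

$$\int_{-\infty}^{\infty} \big(g^{(l)}(x + t) - g^{(l)}(x)\big)^2\,dx = \frac{1}{2\pi}\int_{-\infty}^{\infty} |u^l\phi_g(u)|^2\,|e^{itu} - 1|^2\,du, \tag{4.1}$$

where we used the fact that $|\phi_{g^{(l)}}(u)| = |u^l\phi_g(u)|$ (see Schwartz [13], pages 180-182). The
latter is true because the derivatives of $g(x)$ up to order $l$ are integrable, which can be
verified by direct computation employing formula (III, 2;8) of Schwartz [13]. From (4.1)
it follows that

$$\int_{-\infty}^{\infty} \big(g^{(l)}(x + t) - g^{(l)}(x)\big)^2\,dx \le \Big(\frac{\lambda e^{\lambda}}{e^{\lambda} - 1}\Big)^2 \frac{1}{2\pi}\int_{-\infty}^{\infty} |u^l\phi_f(u)|^2\,|e^{itu} - 1|^2\,du.$$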



682 



B. van Es, S. Gugushvili and P. Spreij 



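Applying Parseval's theorem to the right-hand side and recalling that $f$ belongs to
$\mathcal{N}(\beta, L_2)$, we conclude that $g$ belongs to $\mathcal{N}(\beta, \lambda e^{\lambda}(e^{\lambda} - 1)^{-1} L_2)$. Now we will verify
that $g \in \mathcal{H}(\beta, L_1)$. We have

$$g^{(l)}(x) = \sum_{n=1}^{\infty} f^{*(n-1)} * f^{(l)}(x)\,P(N = n \mid N > 0),$$

with the convention $f^{*0} * f^{(l)} = f^{(l)}$. Using this expression, we get

$$\begin{aligned}
|g^{(l)}(x + t) - g^{(l)}(x)| &= \Big|\sum_{n=1}^{\infty} P(N = n \mid N > 0)\int_{-\infty}^{\infty}\big(f^{(l)}(x + t - u) - f^{(l)}(x - u)\big) f^{*(n-1)}(u)\,du\Big| \\
&\le L_1|t|^{\beta - l}\sum_{n=1}^{\infty} P(N = n \mid N > 0)\int_{-\infty}^{\infty} f^{*(n-1)}(u)\,du = L_1|t|^{\beta - l}.
\end{aligned}$$

This completes the proof of the lemma. □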
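The bound $|\phi_g(t)| \le \lambda e^{\lambda}(e^{\lambda} - 1)^{-1}|\phi_f(t)|$ from the first step of the proof can be checked numerically. The sketch below does so for $N(1, 1)$ jumps (a complex-valued $\phi_f$) and a few values of $\lambda$, all of these being our own choices. The grid is restricted to $|t| \le 6$ so that $e^{\lambda\phi_f(t)} - 1$ is not dominated by rounding error.

```python
import numpy as np

t = np.linspace(-6.0, 6.0, 2401)
phi_f = np.exp(1j * t - t**2 / 2)                     # ch.f. of N(1, 1) jumps

worst_ratio = {}
for lam in (0.3, 1.0, 3.0):
    phi_g = (np.exp(lam * phi_f) - 1) / np.expm1(lam)   # ch.f. of X given N > 0
    C = lam * np.exp(lam) / np.expm1(lam)               # constant in the proof of Lemma 2.1
    worst_ratio[lam] = np.max(np.abs(phi_g) / (C * np.abs(phi_f)))
print(worst_ratio)                                       # every value is at most 1
```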

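Proof of Proposition 2.1. Denote the bias by $b_w(n, h, x) = \mathrm{E}[f_{nh}(x)] - f(x)$. We may write

$$b_w(n, h, x) = \mathrm{E}\big[f_{nh}(x)1_{[J_n < \delta]} + f_{nh}(x)1_{[J_n \ge \delta]} - f(x)1_{[J_n < \delta]}\big] - f(x)P(J_n \ge \delta),$$

where $\delta$ is any positive number and $J_n = \int_{-\infty}^{\infty}|g_{nh}(x) - g(x)|\,dx$ denotes the integrated absolute error of the
estimator $g_{nh}$. We have

$$\big|\mathrm{E}[f_{nh}(x)1_{[J_n \ge \delta]}]\big| \le M_n P(J_n \ge \delta).$$

This term is of lower order than $h^{\beta}$. To see this, recall the special form of $M_n$ and $h$, and
apply the exponential bound on $P(J_n \ge \delta)$ that is valid for all $n$ sufficiently large (see
Devroye [5], page 36, Remark 3). Also $f(x)P(J_n \ge \delta) = o(h^{\beta})$.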
Now we turn to

$$\mathrm{E}\big[(f_{nh}(x) - f(x))1_{[J_n < \delta]}\big].$$

By selecting $\delta$, we can achieve that $\phi_{g_{nh}}(t)$ is uniformly close to $\phi_g(t)$ on the set $\{J_n < \delta\}$.
This is true because if $J_n < \delta$, then

$$|\phi_{g_{nh}}(t) - \phi_g(t)| = \Big|\int_{-\infty}^{\infty} e^{itx}\big(g_{nh}(x) - g(x)\big)\,dx\Big| \le J_n < \delta. \tag{4.2}$$

This in turn implies that for $\delta$ small (e.g., $\delta = e^{-2\lambda}/2$), $(e^{\lambda} - 1)\phi_{g_{nh}}(t) + 1$ is bounded
away from zero on the set $\{J_n < \delta\}$, because

$$|(e^{\lambda} - 1)\phi_g(t) + 1| = |e^{\lambda\phi_f(t)}| \ge e^{-\lambda}.$$

Therefore, the distinguished logarithm will be well defined on this set and $\log|(e^{\lambda} - 1)\phi_{g_{nh}}(t) + 1|$,
that is, the real part of the distinguished logarithm $\mathrm{Log}((e^{\lambda} - 1)\phi_{g_{nh}}(t) + 1)$,
will be bounded on $\{J_n < \delta\}$. The imaginary part of $\mathrm{Log}((e^{\lambda} - 1)\phi_{g_{nh}}(t) + 1)$ is also
bounded. This holds true because, for $t$ sufficiently large, $(e^{\lambda} - 1)\phi_g(t) + 1$ is arbitrarily
close to 1 and hence the argument of the distinguished logarithm $\mathrm{Log}((e^{\lambda} - 1)\phi_g(t) + 1)$
cannot circle around zero infinitely many times. To see the latter, we can argue as follows:
there exists $t^*$ such that, for $t > t^*$, $(e^{\lambda} - 1)\phi_g(t) + 1$ does not make a turn around zero,
because as $t \to \infty$, the function tends to 1. If we assume that $(e^{\lambda} - 1)\phi_{g_{nh}}(t) + 1$ makes
an infinite number of turns around zero on $[0, t^*]$, then its length on $[0, t^*]$ must also be
infinite (because the curve stays away from zero at a positive distance). One can check
that under the given conditions on $w$, the latter is not true and, hence, also $(e^{\lambda} - 1)\phi_{g_{nh}}(t) + 1$
can make only a finite number of turns around zero.

Thus on the set $\{J_n < \delta\}$, the argument of $\mathrm{Log}((e^{\lambda} - 1)\phi_{g_{nh}}(t) + 1)$ will be bounded for
$\delta$ small and, hence, on the set $\{J_n < \delta\}$ for large $n$ and small $\delta$, the truncation becomes
unimportant and we have $f_{nh}(x) = \tilde f_{nh}(x)$. Therefore,



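$$\begin{aligned}
\mathrm{E}\big[(f_{nh}(x) - f(x))1_{[J_n < \delta]}\big]
&= \mathrm{E}\Big[\frac{1}{2\pi\lambda}\int_{-1/h}^{1/h} e^{-itx}\Big(\mathrm{Log}\big((e^{\lambda} - 1)\phi_{g_{nh}}(t) + 1\big) - \mathrm{Log}\big((e^{\lambda} - 1)\phi_g(t) + 1\big)\Big)\,dt\,1_{[J_n < \delta]}\Big] \\
&\quad - \frac{1}{2\pi}\int_{-\infty}^{-1/h} e^{-itx}\phi_f(t)\,dt\;P(J_n < \delta) - \frac{1}{2\pi}\int_{1/h}^{\infty} e^{-itx}\phi_f(t)\,dt\;P(J_n < \delta).
\end{aligned}$$

The last two terms are of lower order than $h^{\beta}$. Indeed, we have, for example,

$$\Big|\frac{1}{2\pi}\int_{1/h}^{\infty} e^{-itx}\phi_f(t)\,dt\Big| \le \frac{h^{\beta}}{2\pi}\int_{1/h}^{\infty} |t|^{\beta}|\phi_f(t)|\,dt = o(h^{\beta}). \tag{4.3}$$

Hence we need to study

$$\mathrm{E}\Big[\frac{1}{2\pi\lambda}\int_{-1/h}^{1/h} e^{-itx}\,\mathrm{Log}\big(1 + z_{nh}(t)\big)\,dt\,1_{[J_n < \delta]}\Big], \tag{4.4}$$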
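where

$$z_{nh}(t) = \frac{(e^{\lambda} - 1)\big(\phi_{g_{nh}}(t) - \phi_g(t)\big)}{(e^{\lambda} - 1)\phi_g(t) + 1}.$$

Note that $z_{nh}$ is bounded. Rewrite (4.4) as

$$\frac{1}{2\pi\lambda}\mathrm{E}\Big[\int_{-1/h}^{1/h} e^{-itx} z_{nh}(t)\,dt\,1_{[J_n < \delta]}\Big] + \frac{1}{2\pi\lambda}\mathrm{E}\Big[\int_{-1/h}^{1/h} e^{-itx} R_{nh}(t)\,dt\,1_{[J_n < \delta]}\Big], \tag{4.5}$$

where

$$R_{nh}(t) = \mathrm{Log}\big(1 + z_{nh}(t)\big) - z_{nh}(t).$$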



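Consider the first term in (4.5). We claim that the omission of $1_{[J_n < \delta]}$ will result in an
error of lower order than $h^{\beta}$. In fact,

$$\mathrm{E}\Big[\int_{-1/h}^{1/h} e^{-itx} z_{nh}(t)\,dt\,1_{[J_n < \delta]}\Big] = \mathrm{E}\Big[\int_{-1/h}^{1/h} e^{-itx} z_{nh}(t)\,dt\Big] - \mathrm{E}\Big[\int_{-1/h}^{1/h} e^{-itx} z_{nh}(t)\,dt\,1_{[J_n \ge \delta]}\Big].$$

The second term is bounded by $Ch^{-1}P(J_n \ge \delta)$, where $C$ is some constant, and this is
of lower order than $h^{\beta}$ (recall the exponential bound of Devroye [5] on $P(J_n \ge \delta)$).
Using the fact that $\mathrm{E}[\phi_{\mathrm{emp}}(t)] = \phi_g(t)$ and that $((e^{\lambda} - 1)\phi_g(t) + 1)^{-1} = e^{-\lambda\phi_f(t)} = 1 + (e^{-\lambda\phi_f(t)} - 1)$, we obtain

$$\begin{aligned}
&\frac{1}{2\pi\lambda}\mathrm{E}\Big[\int_{-1/h}^{1/h} e^{-itx}\,\frac{(e^{\lambda} - 1)\big(\phi_{\mathrm{emp}}(t)\phi_w(ht) - \phi_g(t)\big)}{(e^{\lambda} - 1)\phi_g(t) + 1}\,dt\Big] \\
&\quad = \frac{e^{\lambda} - 1}{2\pi\lambda}\int_{-1/h}^{1/h} e^{-itx}\big(\phi_g(t)\phi_w(ht) - \phi_g(t)\big)\,dt \\
&\qquad + \frac{e^{\lambda} - 1}{2\pi\lambda}\int_{-1/h}^{1/h} e^{-itx}\big(\phi_g(t)\phi_w(ht) - \phi_g(t)\big)\big(e^{-\lambda\phi_f(t)} - 1\big)\,dt.
\end{aligned} \tag{4.6}$$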



The first summand in the latter expression differs, up to the factor $(e^{\lambda} - 1)/\lambda$, from the bias of the kernel estimator
$g_{nh}(x)$ only by the absence of the term $-\frac{1}{2\pi}\int_{-\infty}^{-1/h} e^{-itx}\phi_g(t)\,dt - \frac{1}{2\pi}\int_{1/h}^{\infty} e^{-itx}\phi_g(t)\,dt$. This additional
term is of lower order than $h^{\beta}$ (cf. (4.3)). Under Conditions W and F and due to Lemma
2.1, the bias of $g_{nh}(x)$ is of order $h^{\beta}$ (see Tsybakov [16], Proposition 1.2). As far as the
second summand in (4.6) is concerned, it is dominated by

$$\frac{(e^{\lambda} - 1)e^{\lambda}}{2\pi}\int_{-1/h}^{1/h} |\phi_g(t)\phi_w(ht) - \phi_g(t)|\,|\phi_f(t)|\,dt, \tag{4.7}$$

because

$$|e^{-\lambda\phi_f(t)} - 1| \le \lambda e^{\lambda}|\phi_f(t)|.$$

Application of the Cauchy-Schwarz inequality to the integral in (4.7) yields that it is
bounded from above by

$$\Big(\int_{-1/h}^{1/h} |\phi_g(t)\phi_w(ht) - \phi_g(t)|^2\,dt\Big)^{1/2}\Big(\int_{-1/h}^{1/h} |\phi_f(t)|^2\,dt\Big)^{1/2}.$$

The second factor in this expression is bounded uniformly in $h$ thanks to the fact that
$\phi_f$ is integrable ($|\phi_f(t)|^2 \le |\phi_f(t)|$ is consequently also integrable). As far as the first factor is
concerned, by Parseval's theorem it is bounded by the square root of $2\pi$ times the integrated squared bias of the
estimator $g_{nh}$,

$$\int_{-\infty}^{\infty} \big(g * w_h(x) - g(x)\big)^2\,dx, \qquad \text{where} \quad w_h(x) = \frac{1}{h}\,w\Big(\frac{x}{h}\Big).$$

Because, under Conditions F and W, the integrated squared bias of $g_{nh}$ is of order $h^{2\beta}$
(see Tsybakov [16], Proposition 1.8), we conclude that (4.6) is of order $h^{\beta}$. This gives us
the order of the leading term (4.6) in the bias expansion.
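Now we turn to the second term in (4.5). We have

$$\Big|\mathrm{E}\Big[\int_{-1/h}^{1/h} e^{-itx} R_{nh}(t)\,dt\,1_{[J_n < \delta]}\Big]\Big| \le \mathrm{E}\Big[\int_{-1/h}^{1/h} |R_{nh}(t)|\,dt\,1_{[J_n < \delta]}\Big]. \tag{4.8}$$

To deal with this term we will need the inequality

$$|\mathrm{Log}(1 + z_{nh}(t)) - z_{nh}(t)| \le |z_{nh}(t)|^2,$$

provided that $|z_{nh}(t)| \le 1/2$. This inequality follows from the power series expansion $\mathrm{Log}(1 + z) - z = \sum_{k \ge 2}(-1)^{k+1}z^k/k$, valid for $|z| < 1$, which gives $|\mathrm{Log}(1 + z) - z| \le |z|^2/(2(1 - |z|)) \le |z|^2$ for $|z| \le 1/2$. By choosing $n$ large
enough and $\delta$ small, $J_n < \delta$ will entail $|z_{nh}(t)| \le 1/2$ for all $t$ (see (4.2)); the path $1 + z_{nh}(t)$ then stays in the disc of radius 1/2 around 1, so its distinguished logarithm coincides with the principal branch. Using this inequality in (4.8),
we obtain

$$\begin{aligned}
\mathrm{E}\Big[\int_{-1/h}^{1/h} |R_{nh}(t)|\,dt\,1_{[J_n < \delta]}\Big]
&\le \mathrm{E}\Big[\int_{-1/h}^{1/h} |z_{nh}(t)|^2\,dt\Big]
\le K_1\,\mathrm{E}\Big[\int_{-\infty}^{\infty} |\phi_{\mathrm{emp}}(t)\phi_w(ht) - \phi_g(t)|^2\,dt\Big] \\
&= K\,\mathrm{E}\Big[\int_{-\infty}^{\infty} \big(g_{nh}(x) - g(x)\big)^2\,dx\Big] = K\,\mathrm{MISE}_n(h),
\end{aligned} \tag{4.9}$$

where $K_1$ and $K = 2\pi K_1$ are constants. Here we used the fact that $|(e^{\lambda} - 1)\phi_g(t) + 1| = e^{\lambda\,\mathrm{Re}\,\phi_f(t)}$ is bounded
from below and applied Parseval's identity. Using the bound on $\mathrm{MISE}_n(h)$ (see Tsybakov
[16], page 21) and combining it with (4.6), we establish the desired result. □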
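The elementary inequality $|\mathrm{Log}(1 + z) - z| \le |z|^2$ for $|z| \le 1/2$ (principal branch) is easy to probe numerically. The sketch below samples points uniformly in the disc $|z| \le 1/2$ and also compares with the sharper bound $|z|^2/(2(1 - |z|))$ obtained from the power series; the sample size and the seed are arbitrary choices of ours.

```python
import numpy as np

rng = np.random.default_rng(3)
m = 100_000
r = 0.5 * np.sqrt(rng.uniform(size=m))            # radii: uniform distribution on the disc
theta = rng.uniform(0.0, 2 * np.pi, size=m)
z = r * np.exp(1j * theta)

remainder = np.abs(np.log(1 + z) - z)             # |Log(1 + z) - z|, principal branch
series_bound = np.abs(z) ** 2 / (2 * (1 - np.abs(z)))
print(np.max(remainder / np.abs(z) ** 2))         # stays below 1 (about 0.77 near z = -1/2)
print(np.all(remainder <= series_bound * (1 + 1e-9)))
```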

Proof of Proposition 2.2. Throughout the proof we will frequently use the following
version of the Cauchy-Schwarz inequality: if $\xi$ and $\eta$ are random variables, then
$|\mathrm{Cov}[\xi, \eta]| \le \sqrt{\mathrm{Var}[\xi]\,\mathrm{Var}[\eta]}$ provided that the variances exist. Hence, if the variance of
$\eta$ is negligible in comparison to that of $\xi$, then $\mathrm{Cov}[\xi, \eta]$ also will be negligible in comparison
to $\mathrm{Var}[\xi]$ and, therefore, $\mathrm{Var}[\xi + \eta] \sim \mathrm{Var}[\xi]$; that is, the leading term of $\mathrm{Var}[\xi + \eta]$
is $\mathrm{Var}[\xi]$.

Now we turn to the proof of the proposition itself. We have

$$\mathrm{Var}[f_{nh}(x)] = \mathrm{Var}\big[f_{nh}(x)1_{[J_n < \delta]} + f_{nh}(x)1_{[J_n \ge \delta]}\big].$$

The variance of $f_{nh}(x)1_{[J_n \ge \delta]}$ is of lower order than $(nh)^{-1}$, because of the special form
$M_n = n^{\alpha}$, the exponential bound on $P(J_n \ge \delta)$ and the inequality

$$\mathrm{Var}\big[f_{nh}(x)1_{[J_n \ge \delta]}\big] \le \mathrm{E}\big[(f_{nh}(x))^2 1_{[J_n \ge \delta]}\big] \le M_n^2 P(J_n \ge \delta).$$

Therefore, it suffices to consider $\mathrm{Var}[f_{nh}(x)1_{[J_n < \delta]}]$. We have

$$\mathrm{Var}\big[f_{nh}(x)1_{[J_n < \delta]}\big] = \mathrm{Var}\big[f_{nh}(x)1_{[J_n < \delta]} - f(x)\big],$$

and because again the variance of $f(x)1_{[J_n \ge \delta]}$ is of lower order than $(nh)^{-1}$, we can
consider $\mathrm{Var}[(f_{nh}(x) - f(x))1_{[J_n < \delta]}]$ instead. As we have seen in the proof of Proposition
2.1, on the set $\{J_n < \delta\}$ for $n$ large and $\delta$ sufficiently small, $f_{nh}(x) = \tilde f_{nh}(x)$ and the
distinguished logarithm is well defined. Write

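$$\begin{aligned}
\mathrm{Var}\big[(f_{nh}(x) - f(x))1_{[J_n < \delta]}\big]
= \mathrm{Var}\Big[\Big(&\frac{1}{2\pi\lambda}\int_{-1/h}^{1/h} e^{-itx} z_{nh}(t)\,dt + \frac{1}{2\pi\lambda}\int_{-1/h}^{1/h} e^{-itx} R_{nh}(t)\,dt \\
&- \frac{1}{2\pi}\int_{-\infty}^{-1/h} e^{-itx}\phi_f(t)\,dt - \frac{1}{2\pi}\int_{1/h}^{\infty} e^{-itx}\phi_f(t)\,dt\Big)1_{[J_n < \delta]}\Big].
\end{aligned}$$

The variances of the last two terms are negligible. Indeed, we have, for example,

$$\mathrm{Var}\Big[\frac{1}{2\pi}\int_{1/h}^{\infty} e^{-itx}\phi_f(t)\,dt\;1_{[J_n < \delta]}\Big] = \Big|\frac{1}{2\pi}\int_{1/h}^{\infty} e^{-itx}\phi_f(t)\,dt\Big|^2\,\mathrm{Var}\big[1_{[J_n \ge \delta]}\big] \le C\,P(J_n \ge \delta)$$

with some constant $C$.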

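Hence we have to deal with

$$\mathrm{Var}\Big[\Big(\frac{1}{2\pi\lambda}\int_{-1/h}^{1/h} e^{-itx} z_{nh}(t)\,dt + \frac{1}{2\pi\lambda}\int_{-1/h}^{1/h} e^{-itx} R_{nh}(t)\,dt\Big)1_{[J_n < \delta]}\Big] =: \mathrm{Var}[\mathrm{I} + \mathrm{II}]. \tag{4.10}$$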



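We show that II has a negligible variance compared to that of I. Indeed, using the bound
(4.9) from the proof of Proposition 2.1,

$$\begin{aligned}
nh\,\mathrm{Var}\Big[\frac{1}{2\pi\lambda}\int_{-1/h}^{1/h} e^{-itx} R_{nh}(t)\,dt\,1_{[J_n < \delta]}\Big] &\le K^2 nh\,\mathrm{E}\big[(\mathrm{ISE}_n(h))^2\big] \\
&= K^2 nh\,\mathrm{Var}[\mathrm{ISE}_n(h)] + K^2 nh\,(\mathrm{MISE}_n(h))^2,
\end{aligned}$$

where $K$ is a constant and $\mathrm{ISE}_n(h) = \int_{-\infty}^{\infty}(g_{nh}(x) - g(x))^2\,dx$ is the integrated squared error of $g_{nh}$. Due to the conditions $nh \to \infty$ and $nh^{1+4\beta} \to 0$, we see that
$nh(\mathrm{MISE}_n(h))^2$ tends to 0.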

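We deal with $nh\,\mathrm{Var}[\mathrm{ISE}_n(h)]$. Let us write the integrated squared error as

$$\begin{aligned}
\mathrm{ISE}_n(h) = {}& \frac{1}{n^2h^2}\sum_{j=1}^{n}\int_{-\infty}^{\infty}\Big(w\Big(\frac{t - Z_j}{h}\Big)\Big)^2 dt + \frac{2}{n^2h}\sum_{j<k} w * w\Big(\frac{Z_j - Z_k}{h}\Big) \\
&- \frac{2}{nh}\sum_{j=1}^{n}\int_{-\infty}^{\infty} w\Big(\frac{t - Z_j}{h}\Big) g(t)\,dt + \int_{-\infty}^{\infty} (g(t))^2\,dt,
\end{aligned}$$

using that

$$\int_{-\infty}^{\infty} w\Big(\frac{t - Z_j}{h}\Big) w\Big(\frac{t - Z_k}{h}\Big)\,dt = h\,w * w\Big(\frac{Z_j - Z_k}{h}\Big),$$

because $w$ is symmetric. Here $w * w$ denotes the convolution of $w$ with itself. The first term equals $(nh)^{-1}\int (w(u))^2\,du$ and is deterministic. From this
it follows that

$$nh\,\mathrm{Var}[\mathrm{ISE}_n(h)] = \frac{4}{n^3h}\,\mathrm{Var}\Big[\sum_{j<k} w * w\Big(\frac{Z_j - Z_k}{h}\Big) - n\sum_{j=1}^{n}\int_{-\infty}^{\infty} w\Big(\frac{t - Z_j}{h}\Big) g(t)\,dt\Big]. \tag{4.11}$$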
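The decomposition of $\mathrm{ISE}_n(h)$ can be checked numerically. For convenience the sketch below uses a Gaussian kernel, for which $w * w$ is the $N(0, 2)$ density, and takes $g$ to be the standard normal density, so that every term has a closed form. Both choices are ours and serve only to check the algebra; the Gaussian kernel does not satisfy Condition W.

```python
import numpy as np

def norm_pdf(x, s=1.0):
    """N(0, s^2) density."""
    return np.exp(-0.5 * (x / s) ** 2) / (s * np.sqrt(2 * np.pi))

def w_conv_w(u):
    """w * w for the Gaussian kernel w = N(0, 1): the N(0, 2) density."""
    return norm_pdf(u, np.sqrt(2.0))

rng = np.random.default_rng(4)
n, h = 40, 0.3
z = rng.standard_normal(n)                     # here g = N(0, 1), for illustration only

# Direct evaluation of ISE_n(h) = int (g_nh - g)^2 on a fine grid.
x = np.linspace(-12.0, 12.0, 48001)
dx = x[1] - x[0]
g_nh = norm_pdf(np.subtract.outer(x, z) / h).mean(axis=1) / h
ise_direct = np.sum((g_nh - norm_pdf(x)) ** 2) * dx

# The four terms of the decomposition used in the proof.
diag = 1 / (n * h) * (1 / (2 * np.sqrt(np.pi)))                # (nh)^{-1} int w^2
jj, kk = np.triu_indices(n, k=1)                               # all pairs j < k
pairs = 2 / (n**2 * h) * w_conv_w((z[jj] - z[kk]) / h).sum()
cross = -2 / n * norm_pdf(z, np.sqrt(1 + h**2)).sum()          # -(2/(nh)) sum int w((t-Z_j)/h) g(t) dt
const = 1 / (2 * np.sqrt(np.pi))                               # int g^2
ise_formula = diag + pairs + cross + const
print(ise_direct, ise_formula)
```

The cross term uses the fact that $h^{-1}w((t - Z_j)/h)$ is the $N(Z_j, h^2)$ density in $t$, so its integral against $g$ is the $N(0, 1 + h^2)$ density at $Z_j$.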



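We study the variance of each term between the brackets in (4.11) separately (by the Cauchy-Schwarz inequality above, it suffices to show that both contributions tend to zero). For the
second term we have

$$\frac{4}{n^3h}\,\mathrm{Var}\Big[n\sum_{j=1}^{n}\int_{-\infty}^{\infty} w\Big(\frac{t - Z_j}{h}\Big) g(t)\,dt\Big] = \frac{4}{h}\,\mathrm{Var}\Big[\int_{-\infty}^{\infty} w\Big(\frac{t - Z_1}{h}\Big) g(t)\,dt\Big].$$

Through a change of the integration variable it is easily seen that

$$\Big|\int_{-\infty}^{\infty} w\Big(\frac{t - Z_1}{h}\Big) g(t)\,dt\Big| = h\Big|\int_{-\infty}^{\infty} w(u)\,g(uh + Z_1)\,du\Big| \le h\,\|g\|_{\infty}\int_{-\infty}^{\infty} |w(u)|\,du, \tag{4.12}$$

where we used the fact that $g$ is bounded. Hence the variance of the second term is bounded by $4h\,\|g\|_{\infty}^2\big(\int|w(u)|\,du\big)^2$ and vanishes as $h \to 0$. Now we arrive
at the computation of the variance of the first term between the brackets in (4.11). We
have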



$$\frac{4}{n^3h}\,\mathrm{Var}\Big[\sum_{j<k} w * w\Big(\frac{Z_j - Z_k}{h}\Big)\Big] = \frac{4}{n^3h}\sum_{i<j}\sum_{k<l}\mathrm{Cov}\Big[w * w\Big(\frac{Z_i - Z_j}{h}\Big),\, w * w\Big(\frac{Z_k - Z_l}{h}\Big)\Big].$$

We have three possibilities:

1. $i, j, k, l$ are distinct. Then, because of the independence, the corresponding covariances
are 0.

2. $i = k$, $j = l$. The number of such possibilities is of order $n^2$ and because the covariances
are bounded (because the convolution $w * w$ is bounded), the sum of such terms will be of order $n^2$;
after multiplication by $4/(n^3h)$ their contribution is $O((nh)^{-1})$.

3. The last possibility is that three indices out of four are distinct, for example, $i = k$,
$j \ne l$. The number of such terms is of order $n^3$. Thus we have to study the
behaviour of, for example,

$$\frac{1}{h}\,\mathrm{Cov}\Big[w * w\Big(\frac{Z_i - Z_j}{h}\Big),\, w * w\Big(\frac{Z_i - Z_l}{h}\Big)\Big].$$
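Writing out this covariance yields

$$\frac{1}{h}\,\mathrm{E}\Big[w * w\Big(\frac{Z_i - Z_j}{h}\Big)\,w * w\Big(\frac{Z_i - Z_l}{h}\Big)\Big] - \frac{1}{h}\,\mathrm{E}\Big[w * w\Big(\frac{Z_i - Z_j}{h}\Big)\Big]\,\mathrm{E}\Big[w * w\Big(\frac{Z_i - Z_l}{h}\Big)\Big].$$

Note that because $w$ is bounded and integrable, $w * w$ is also bounded and integrable. Conditioning on $Z_i$ and using the argument of (4.12) with $|w * w|$ in place of $w$, we see that both terms above are bounded in absolute value by $\|g\|_{\infty}\int_{-\infty}^{\infty}|w * w(u)|\,du$ times

$$\mathrm{E}\Big[\Big|w * w\Big(\frac{Z_i - Z_j}{h}\Big)\Big|\Big]. \tag{4.13}$$

Hence it is sufficient to study the behaviour of (4.13). To do this, first note that $Z_i - Z_j$ has density

$$m(x) = \int_{-\infty}^{\infty} g(t - x)\,g(t)\,dt.$$

Using the change of variable formula and Fubini's theorem, we see that (4.13) can be
written as

$$\int_{-\infty}^{\infty}\Big|w * w\Big(\frac{x}{h}\Big)\Big|\,m(x)\,dx = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\Big|w * w\Big(\frac{x}{h}\Big)\Big|\,g(t - x)\,g(t)\,dx\,dt.$$

Due to the fact that $\lim_{|u| \to \infty} w * w(u) = 0$ and applying the dominated convergence theorem,
we conclude that this double integral converges to 0 as $h \to 0$. Hence (4.11) tends
to zero. Thus $\mathrm{Var}[\mathrm{II}]$ is indeed negligible in comparison to $\mathrm{Var}[\mathrm{I}]$.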
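Now we need to study (cf. (4.10))

$$\mathrm{Var}\Big[\frac{1}{2\pi\lambda}\int_{-1/h}^{1/h} e^{-itx} z_{nh}(t)\,dt\,1_{[J_n < \delta]}\Big].$$

Once again, applying the by now standard argument, we may substitute $1_{[J_n < \delta]}$ with 1, because the error will be of lower order than $(nh)^{-1}$.
Furthermore,

$$\mathrm{Var}\Big[\frac{1}{2\pi\lambda}\int_{-1/h}^{1/h} e^{-itx} z_{nh}(t)\,dt\Big] = \mathrm{Var}[A_{nh}(x) + B_{nh}(x)],$$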
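where

$$A_{nh}(x) = \frac{e^{\lambda} - 1}{2\pi\lambda}\int_{-1/h}^{1/h} e^{-itx}\big(\phi_{\mathrm{emp}}(t)\phi_w(ht) - \phi_g(t)\big)\,dt = \frac{e^{\lambda} - 1}{\lambda}\big(g_{nh}(x) - g(x)\big) + \frac{e^{\lambda} - 1}{2\pi\lambda}\int_{|t| > 1/h} e^{-itx}\phi_g(t)\,dt,$$

$$B_{nh}(x) = \frac{e^{\lambda} - 1}{2\pi\lambda}\int_{-1/h}^{1/h} e^{-itx}\big(\phi_{\mathrm{emp}}(t)\phi_w(ht) - \phi_g(t)\big)\big(e^{-\lambda\phi_f(t)} - 1\big)\,dt.$$

Since the last term in $A_{nh}(x)$ is deterministic, $\mathrm{Var}[A_{nh}(x)] = \frac{(e^{\lambda} - 1)^2}{\lambda^2}\mathrm{Var}[g_{nh}(x)]$. For the variance of $g_{nh}(x)$ we have the expansion

$$\mathrm{Var}[g_{nh}(x)] = \frac{1}{nh}\,g(x)\int_{-\infty}^{\infty} (w(u))^2\,du + o\Big(\frac{1}{nh}\Big);$$

see Tsybakov ([16], Proposition 1.4).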

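We will show that the variance of $B_{nh}(x)$ is of smaller order than $(nh)^{-1}$. Since $\phi_{\mathrm{emp}}(t)$ is the average of the i.i.d. terms $e^{itZ_j}$, we have

$$nh\operatorname{Var}[B_{nh}(x)] = nh\operatorname{Var}\left[\frac{e^\lambda-1}{2\pi\lambda}\int_{-\infty}^{\infty} e^{-itx}\left(\phi_{\mathrm{emp}}(t)\phi_w(ht)-\phi_g(t)\right)\left(e^{-\lambda\phi_f(t)}-1\right)dt\right] = \frac{(e^\lambda-1)^2}{(2\pi\lambda)^2}\,h\operatorname{Var}\left[\int_{-\infty}^{\infty} e^{-it(x-Z_1)}\phi_w(ht)\left(e^{-\lambda\phi_f(t)}-1\right)dt\right].$$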



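Now note that

$$\left|\int_{-\infty}^{\infty} e^{-it(x-Z_1)}\phi_w(ht)\left(e^{-\lambda\phi_f(t)}-1\right)dt\right| \le \int_{-\infty}^{\infty}\left|e^{-\lambda\phi_f(t)}-1\right|dt =: K,$$

and that $K$ is finite because $\phi_f$ is integrable. A random variable $\xi$ with $|\xi| \le K$ satisfies $\operatorname{Var}[\xi] \le K^2$. The previous display therefore gives $nh\operatorname{Var}[B_{nh}(x)] \le (e^\lambda-1)^2(2\pi\lambda)^{-2}hK^2 \to 0$, that is, $\operatorname{Var}[B_{nh}(x)] = o\left(\frac{1}{nh}\right)$.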
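The bounding constant $K = \int_{-\infty}^{\infty}|e^{-\lambda\phi_f(t)}-1|\,dt$ is easy to evaluate numerically. For illustration, the sketch below assumes $\lambda = 1$ and a standard normal jump density. Then $\phi_f(t)=e^{-t^2/2}$ is real and nonnegative, and $0 \le 1-e^{-\lambda\phi_f(t)} \le \lambda\phi_f(t)$ gives $K \le \lambda\sqrt{2\pi}$. The printed bound on $nh\operatorname{Var}[B_{nh}(x)]$ also uses $|\phi_w| \le 1$, which we assume here.

```python
import numpy as np

lam = 1.0
t = np.linspace(-12.0, 12.0, 24001)
dt = t[1] - t[0]
phi_f = np.exp(-0.5 * t**2)   # illustrative: f = N(0, 1), so phi_f is real, nonnegative and integrable

# K = ∫ |e^{-lam phi_f(t)} - 1| dt, approximated by a Riemann sum.
K = np.sum(np.abs(np.exp(-lam * phi_f) - 1.0)) * dt
# Since 0 <= 1 - e^{-a} <= a for a >= 0, K <= lam ∫ phi_f(t) dt = lam * sqrt(2 pi).
bound = lam * np.sum(phi_f) * dt
print(K, bound)

# Resulting bound nh Var[B_nh(x)] <= (e^lam - 1)^2 / (2 pi lam)^2 * h * K^2 (assuming |phi_w| <= 1).
for h in [0.1, 0.01, 0.001]:
    print(h, np.expm1(lam) ** 2 / (2.0 * np.pi * lam) ** 2 * h * K**2)
```

For a general $f$, $\phi_f$ may be complex. In that case $|e^{z}-1| \le |z|e^{|z|}$ together with $|\lambda\phi_f(t)| \le \lambda$ gives $K \le \lambda e^{\lambda}\int|\phi_f(t)|\,dt$.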

By combining all the intermediate results, we see that the leading term of $\operatorname{Var}[f_{nh}(x)]$ is

$$\frac{1}{nh}\frac{(e^\lambda-1)^2}{\lambda^2}g(x)\int_{-\infty}^{\infty}(w(u))^2\,du$$

and that the other terms are of lower order than $(nh)^{-1}$. □
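The leading term differs from the leading variance term of an ordinary kernel estimator of $g$ only by the factor $((e^\lambda-1)/\lambda)^2$. The small helpers below compute both leading terms and their ratio for a few values of $\lambda$. The inputs for $g(x)$ and $\int w^2$ are hypothetical.

```python
import math

def leading_variance(n, h, lam, g_x, int_w2):
    """Leading term (1/(nh)) ((e^lam - 1)/lam)^2 g(x) ∫ w(u)^2 du of Var[f_nh(x)]."""
    return (math.expm1(lam) / lam) ** 2 * g_x * int_w2 / (n * h)

def kde_leading_variance(n, h, g_x, int_w2):
    """Leading term (1/(nh)) g(x) ∫ w(u)^2 du of Var[g_nh(x)] for the ordinary kernel estimator."""
    return g_x * int_w2 / (n * h)

# Hypothetical inputs: g(x) = 0.3 and ∫ w^2 = 1/(2 sqrt(pi)) (standard normal kernel).
n, h, g_x, int_w2 = 10_000, 0.1, 0.3, 1.0 / (2.0 * math.sqrt(math.pi))
for lam in [0.1, 0.5, 1.0, 2.0]:
    factor = leading_variance(n, h, lam, g_x, int_w2) / kde_leading_variance(n, h, g_x, int_w2)
    print(lam, round(factor, 4))
```

The ratio does not depend on $n$, $h$, $g(x)$ or $\int w^2$. It tends to 1 as $\lambda \to 0$ and grows quickly with $\lambda$ (it equals $(e-1)^2 \approx 2.95$ for $\lambda = 1$).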



Proof of Proposition 2.3. The result follows immediately from the decomposition

$$\operatorname{MSE}[f_{nh}(x)] = \operatorname{Var}[f_{nh}(x)] + (b_w(n,h,x))^2$$

and Propositions 2.1 and 2.2. □
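The decomposition $\operatorname{MSE} = \operatorname{Var} + \text{bias}^2$ can also be seen on simulated data. The sketch below is an illustration, not the paper's implementation. It makes the following assumptions:

- the estimator has the form $f_{nh}(x)=\frac{1}{2\pi\lambda}\int_{-1/h}^{1/h}e^{-itx}\operatorname{Log}(1+(e^\lambda-1)\phi_{\mathrm{emp}}(t)\phi_w(ht))\,dt$, which is consistent with the representation through $\operatorname{Log}(1+z_{nh}(t))$ used in the proof of Theorem 2.1;
- standard normal jumps and $\lambda=0.5$;
- the hypothetical kernel $\phi_w(s)=(1-s^2)^3$ on $[-1,1]$;
- the $Z_j$ taken as the observations with at least one jump.

For $\lambda<\log 2$ the principal logarithm can stand in for the distinguished logarithm.

```python
import numpy as np

def f_nh(x, Z, lam, h, m=801):
    """Sketch of a decompounding kernel estimate of f at the point x.

    Assumed form: (1/(2 pi lam)) ∫_{-1/h}^{1/h} e^{-itx} Log(1 + (e^lam - 1) phi_emp(t) phi_w(ht)) dt,
    with the hypothetical kernel phi_w(s) = (1 - s^2)^3 on [-1, 1]. For lam < log 2 the term
    (e^lam - 1) phi_emp phi_w has modulus below 1, so np.log (principal branch) is continuous
    in t and agrees with the distinguished logarithm Log.
    """
    t = np.linspace(-1.0 / h, 1.0 / h, m)
    dt = t[1] - t[0]
    phi_emp = np.exp(1j * np.outer(t, Z)).mean(axis=1)
    phi_w = np.clip(1.0 - (h * t) ** 2, 0.0, None) ** 3
    integrand = np.exp(-1j * t * x) * np.log(1.0 + np.expm1(lam) * phi_emp * phi_w)
    return np.real(np.sum(integrand) * dt) / (2.0 * np.pi * lam)

def simulate_Z(n, lam, rng):
    # Compound Poisson observations with N(0, 1) jumps; keep those with at least one jump.
    N = rng.poisson(lam, size=n)
    X = np.sqrt(N) * rng.standard_normal(n)   # given N = k, X ~ N(0, k)
    return X[N > 0]

rng = np.random.default_rng(3)
lam, n, h, x, reps = 0.5, 5000, 0.1, 0.0, 50
true_f = 1.0 / np.sqrt(2.0 * np.pi)             # f(0) for the N(0, 1) jump density
estimates = np.array([f_nh(x, simulate_Z(n, lam, rng), lam, h) for _ in range(reps)])

bias = estimates.mean() - true_f
var = estimates.var()                           # ddof=0, so mse == var + bias**2 up to rounding
mse = np.mean((estimates - true_f) ** 2)
print(bias, var, mse, var + bias**2)
```

Because `var` uses `ddof=0`, the Monte Carlo MSE equals `var + bias**2` up to rounding. The sizes of `bias` and `var` themselves depend on the illustrative choices of $h$ and $n$.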



Proof of Theorem 2.1. The proof is based on repeated applications of Slutsky's theorem (see Serfling [14], Section 1.5.4). We separate from our normalized sum a sequence that is asymptotically normal and show that the remainder converges to zero in probability. Slutsky's theorem then implies that the normalized sum is itself asymptotically normal. Write

$$\frac{f_{nh}(x)-f(x)}{\sqrt{\operatorname{Var}[f_{nh}(x)]}} = \frac{f_{nh}(x)-f(x)}{\sqrt{\operatorname{Var}[f_{nh}(x)]}}1_{[J_n<\delta]} + \frac{f_{nh}(x)-f(x)}{\sqrt{\operatorname{Var}[f_{nh}(x)]}}1_{[J_n\ge\delta]}. \tag{4.14}$$
If we take $n$ large and $\delta$ small, then

fnh{x) - f(x) fnh(x)-f(x) 

= =l[J n <f] - , ; = 1 [J„<g]- 

yVar[/ n/l (x)] yVar[/„h(x)] 
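We treat the first term in (4.14). We have

$$\begin{aligned}
\frac{f_{nh}(x)-f(x)}{\sqrt{\operatorname{Var}[f_{nh}(x)]}}1_{[J_n<\delta]}
&= \frac{1_{[J_n<\delta]}}{\sqrt{\operatorname{Var}[f_{nh}(x)]}}\left(\frac{1}{2\pi\lambda}\int_{-1/h}^{1/h} e^{-itx}\operatorname{Log}(1+z_{nh}(t))\,dt + \frac{1}{2\pi}\int_{-1/h}^{1/h} e^{-itx}\phi_f(t)\,dt - \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\phi_f(t)\,dt\right)\\
&= \frac{1_{[J_n<\delta]}}{\sqrt{\operatorname{Var}[f_{nh}(x)]}}\frac{1}{2\pi\lambda}\int_{-1/h}^{1/h} e^{-itx}\operatorname{Log}(1+z_{nh}(t))\,dt - \frac{1_{[J_n<\delta]}}{\sqrt{\operatorname{Var}[f_{nh}(x)]}}\frac{1}{2\pi}\int_{1/h}^{\infty} e^{-itx}\phi_f(t)\,dt\\
&\qquad - \frac{1_{[J_n<\delta]}}{\sqrt{\operatorname{Var}[f_{nh}(x)]}}\frac{1}{2\pi}\int_{-\infty}^{-1/h} e^{-itx}\phi_f(t)\,dt.
\end{aligned}\tag{4.15}$$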
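Let us denote the second and third terms on the right-hand side of (4.15) by I and II. We can then write (4.15) as

$$\frac{1_{[J_n<\delta]}}{\sqrt{\operatorname{Var}[f_{nh}(x)]}}\frac{1}{2\pi\lambda}\int_{-1/h}^{1/h} e^{-itx}\operatorname{Log}(1+z_{nh}(t))\,dt - (\mathrm{I}-E[\mathrm{I}]) - (\mathrm{II}-E[\mathrm{II}]) - E[\mathrm{I}] - E[\mathrm{II}].$$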



The second and third terms of this expression converge to zero in probability. This follows from Chebyshev's inequality and the facts that

$$\operatorname{Var}[1_{[J_n<\delta]}] = \operatorname{Var}[1_{[J_n\ge\delta]}] \le P(J_n\ge\delta) \sim e^{-Cn}$$

and

$$\operatorname{Var}[f_{nh}(x)] \sim \frac{1}{nh}.$$

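By Slutsky's theorem, we can therefore neglect them. Now we take a further step and rewrite what remains of (4.15) as

$$\frac{1_{[J_n<\delta]}}{\sqrt{\operatorname{Var}[f_{nh}(x)]}}\frac{1}{2\pi\lambda}\int_{-1/h}^{1/h} e^{-itx}z_{nh}(t)\,dt + \frac{1_{[J_n<\delta]}}{\sqrt{\operatorname{Var}[f_{nh}(x)]}}\frac{1}{2\pi\lambda}\int_{-1/h}^{1/h} e^{-itx}R_{nh}(t)\,dt - E[\mathrm{I}+\mathrm{II}].$$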

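Denote the second term in this expression by III. Rewrite the above expression as

$$\frac{1_{[J_n<\delta]}}{\sqrt{\operatorname{Var}[f_{nh}(x)]}}\frac{1}{2\pi\lambda}\int_{-1/h}^{1/h} e^{-itx}z_{nh}(t)\,dt + (\mathrm{III}-E[\mathrm{III}]) - E[\mathrm{I}+\mathrm{II}-\mathrm{III}].$$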

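Again, $\mathrm{III}-E[\mathrm{III}]$ converges to zero in probability and can therefore be neglected. After doing so, we rewrite the above expression as

$$\begin{aligned}
&\frac{1_{[J_n<\delta]}}{\sqrt{\operatorname{Var}[f_{nh}(x)]}}\frac{1}{2\pi\lambda}\int_{-\infty}^{\infty} e^{-itx}z_{nh}(t)\,dt - \frac{1_{[J_n<\delta]}}{\sqrt{\operatorname{Var}[f_{nh}(x)]}}\frac{1}{2\pi\lambda}\int_{1/h}^{\infty} e^{-itx}z_{nh}(t)\,dt\\
&\qquad - \frac{1_{[J_n<\delta]}}{\sqrt{\operatorname{Var}[f_{nh}(x)]}}\frac{1}{2\pi\lambda}\int_{-\infty}^{-1/h} e^{-itx}z_{nh}(t)\,dt - E[\mathrm{I}+\mathrm{II}-\mathrm{III}].
\end{aligned}$$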

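Denote the second and third terms in this expression by IV and V. Then we can write

$$\frac{1_{[J_n<\delta]}}{\sqrt{\operatorname{Var}[f_{nh}(x)]}}\frac{1}{2\pi\lambda}\int_{-\infty}^{\infty} e^{-itx}z_{nh}(t)\,dt - (\mathrm{IV}-E[\mathrm{IV}]) - (\mathrm{V}-E[\mathrm{V}]) - E[\mathrm{I}+\mathrm{II}-\mathrm{III}+\mathrm{IV}+\mathrm{V}].$$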

There is nothing random in IV and V except $1_{[J_n<\delta]}$. By Chebyshev's inequality, $\mathrm{IV}-E[\mathrm{IV}]$ and $\mathrm{V}-E[\mathrm{V}]$ converge to zero in probability and can therefore be neglected.
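The Chebyshev step used for $\mathrm{I}-E[\mathrm{I}]$, $\mathrm{II}-E[\mathrm{II}]$, $\mathrm{IV}-E[\mathrm{IV}]$ and $\mathrm{V}-E[\mathrm{V}]$ only needs the exponential bound on $P(J_n\ge\delta)$ to beat the factor $nh$ that comes from $1/\operatorname{Var}[f_{nh}(x)]$. The sketch below makes this explicit. The constants $c$, $C$ and $\kappa$ are hypothetical (none is taken from the paper), and the bandwidth $h=n^{-1/5}$ is chosen only for illustration.

```python
import math

def chebyshev_bound(n, h, eps, c=1.0, C=0.1, kappa=1.0):
    """Chebyshev bound on P(|T - E[T]| >= eps) for T = c_n * 1[J_n < delta] / sqrt(Var[f_nh(x)]).

    Hypothetical constants: |c_n| <= c, Var[1[J_n < delta]] <= P(J_n >= delta) <= exp(-C n),
    and Var[f_nh(x)] >= kappa / (n h).
    """
    var_T = c**2 * math.exp(-C * n) * (n * h) / kappa
    return min(1.0, var_T / eps**2)   # a probability bound never needs to exceed 1

for n in [100, 1_000, 10_000]:
    h = n ** (-1 / 5)
    print(n, chebyshev_bound(n, h, eps=0.01))
```

For small $n$ the bound is vacuous and is capped at 1. It collapses as soon as $e^{-Cn}$ dominates $nh$.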
We then have to deal with (recall the definition of $z_{nh}$)

$$1_{[J_n<\delta]}\frac{e^\lambda-1}{\lambda}\frac{g_{nh}(x)-g(x)}{\sqrt{\operatorname{Var}[f_{nh}(x)]}} + (\mathrm{VI}-E[\mathrm{VI}]) - E[\mathrm{I}+\mathrm{II}-\mathrm{III}+\mathrm{IV}+\mathrm{V}-\mathrm{VI}],$$

where

$$\mathrm{VI} = \frac{1_{[J_n<\delta]}}{\sqrt{\operatorname{Var}[f_{nh}(x)]}}\frac{1}{2\pi\lambda}\int_{-\infty}^{\infty} e^{-itx}(e^\lambda-1)\left(\phi_{g_{nh}}(t)-\phi_g(t)\right)\left(e^{-\lambda\phi_f(t)}-1\right)dt.$$



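The argument from the proof of Proposition 2.2 shows that the variance of VI converges to zero. Hence, by Chebyshev's inequality, $\mathrm{VI}-E[\mathrm{VI}]$ converges to zero in probability, and we can neglect it. Thus we have

$$1_{[J_n<\delta]}\frac{e^\lambda-1}{\lambda}\frac{g_{nh}(x)-g(x)}{\sqrt{\operatorname{Var}[f_{nh}(x)]}} - E[\mathrm{I}+\mathrm{II}-\mathrm{III}+\mathrm{IV}+\mathrm{V}-\mathrm{VI}].$$

Now rewrite this as

$$1_{[J_n<\delta]}\frac{e^\lambda-1}{\lambda}\frac{g_{nh}(x)-E[g_{nh}(x)]}{\sqrt{\operatorname{Var}[f_{nh}(x)]}} + (\mathrm{VII}-E[\mathrm{VII}]) - E[\mathrm{I}+\mathrm{II}-\mathrm{III}+\mathrm{IV}+\mathrm{V}-\mathrm{VI}-\mathrm{VII}], \tag{4.16}$$

where

$$\mathrm{VII} = 1_{[J_n<\delta]}\frac{e^\lambda-1}{\lambda}\frac{E[g_{nh}(x)]-g(x)}{\sqrt{\operatorname{Var}[f_{nh}(x)]}}.$$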



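By Chebyshev's inequality, $\mathrm{VII}-E[\mathrm{VII}]$ converges to zero in probability and can therefore be neglected. The asymptotically normal term stems from the first term in (4.16). This relies on three facts:

- $1_{[J_n<\delta]} \to 1$ in probability;
- $\frac{(e^\lambda-1)^2}{\lambda^2}\operatorname{Var}[g_{nh}(x)]/\operatorname{Var}[f_{nh}(x)] \to 1$ when $g(x)>0$ (compare the expansion of $\operatorname{Var}[g_{nh}(x)]$ with the leading term of $\operatorname{Var}[f_{nh}(x)]$ in Proposition 2.2);
- the standardized ordinary kernel estimator is asymptotically normal:

$$\frac{g_{nh}(x)-E[g_{nh}(x)]}{\sqrt{\operatorname{Var}[g_{nh}(x)]}} \xrightarrow{\mathcal{D}} N(0,1).$$

The last fact can be verified along the lines of pages 61-62 of Prakasa Rao [12] by checking Lyapunov's condition. It is easy to see that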



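$$-E[\mathrm{I}+\mathrm{II}-\mathrm{III}+\mathrm{IV}+\mathrm{V}-\mathrm{VI}-\mathrm{VII}] = E\left[\frac{f_{nh}(x)-f(x)}{\sqrt{\operatorname{Var}[f_{nh}(x)]}}1_{[J_n<\delta]}\right].$$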

Adding the second term in (4.14) to this expression results in

$$\frac{b_w(n,h,x)}{\sqrt{\operatorname{Var}[f_{nh}(x)]}} + \frac{f_{nh}(x)1_{[J_n\ge\delta]}-E[f_{nh}(x)1_{[J_n\ge\delta]}]}{\sqrt{\operatorname{Var}[f_{nh}(x)]}} - \frac{f(x)1_{[J_n\ge\delta]}-E[f(x)1_{[J_n\ge\delta]}]}{\sqrt{\operatorname{Var}[f_{nh}(x)]}}.$$

The first term goes to zero because we assume that $nh^{2\beta+1} \to 0$. The other two terms converge to zero in probability. Thus, by Slutsky's theorem, these terms can be neglected, and we obtain the desired result. □
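The role of the condition $nh^{2\beta+1}\to 0$ comes down to a rate calculation. Assume, as the condition suggests, that the bias is of order $h^\beta$ while $\operatorname{Var}[f_{nh}(x)]$ is of order $(nh)^{-1}$. Then the ratio of the bias to the standard deviation behaves like $\sqrt{nh^{2\beta+1}}$. The sketch below evaluates this ratio for bandwidths $h=n^{-\alpha}$, using hypothetical unit constants and $\beta=2$.

```python
import math

def bias_to_sd_ratio(n, h, beta, c_bias=1.0, c_var=1.0):
    """|bias| / sd when |bias| = c_bias * h**beta and Var[f_nh(x)] = c_var / (n h).

    Both constants are hypothetical; the ratio equals sqrt(c_bias**2 / c_var * n * h**(2 beta + 1)).
    """
    return c_bias * h**beta / math.sqrt(c_var / (n * h))

beta = 2.0
for alpha in [1.0 / (2.0 * beta + 1.0), 0.3]:    # bandwidth h = n**(-alpha)
    ratios = [bias_to_sd_ratio(n, n ** (-alpha), beta) for n in (10**3, 10**5, 10**7)]
    print(alpha, ratios)
```

For $\alpha = 1/(2\beta+1)$ the ratio stays at 1, so the bias term does not vanish. For any $\alpha > 1/(2\beta+1)$ it decays like $n^{(1-\alpha(2\beta+1))/2}$, which is $n^{-1/4}$ here.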

Proof of Theorem 2.2. Write

$$f_{nh}(x) - E[f_{nh}(x)] = (f_{nh}(x)-f(x))1_{[J_n<\delta]} + (f_{nh}(x)-f(x))1_{[J_n\ge\delta]} + (f(x)-E[f_{nh}(x)]).$$

We use the same type of arguments as in the proof of Theorem 2.1. Here we do not need $nh^{2\beta+1} \to 0$, because the bias divided by the square root of the variance cancels in the intermediate computations. We see that we have to deal with

$$1_{[J_n<\delta]}\frac{e^\lambda-1}{\lambda}\frac{g_{nh}(x)-E[g_{nh}(x)]}{\sqrt{\operatorname{Var}[f_{nh}(x)]}} + \frac{(f_{nh}(x)-f(x))1_{[J_n\ge\delta]}}{\sqrt{\operatorname{Var}[f_{nh}(x)]}} - \frac{E[(f_{nh}(x)-f(x))1_{[J_n\ge\delta]}]}{\sqrt{\operatorname{Var}[f_{nh}(x)]}}.$$

The first term gives asymptotic normality, while the last two terms tend to zero in probability. Slutsky's theorem yields the desired result. □
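Both Theorem 2.1 and Theorem 2.2 rest on the asymptotic normality of the standardized ordinary kernel estimator, which can be illustrated by simulation. The sketch assumes a standard normal $g$ and a standard normal kernel (illustrative choices). It standardizes $g_{nh}(0)$ with its Monte Carlo mean and standard deviation, then reports the proportion of standardized values in $[-1.96, 1.96]$ together with the sample skewness.

```python
import numpy as np

rng = np.random.default_rng(2)
n, h, x, reps = 1000, 0.05, 0.0, 1000

samples = rng.standard_normal((reps, n))        # illustrative choice: g = N(0, 1)
u = (x - samples) / h
g_nh = np.exp(-0.5 * u**2).sum(axis=1) / (n * h * np.sqrt(2.0 * np.pi))   # standard normal kernel

# Standardize with the Monte Carlo mean and standard deviation of g_nh(x).
standardized = (g_nh - g_nh.mean()) / g_nh.std(ddof=1)
coverage = np.mean(np.abs(standardized) <= 1.96)   # near 0.95 under an approximately N(0, 1) law
skewness = np.mean(standardized**3)
print(coverage, skewness)
```

A good normal approximation gives a coverage close to 0.95 and a skewness close to 0. With a smaller $nh$ the skewness becomes more visible.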